Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Dec 22, 2025

Refactors the ManifestReader implementation to support advanced data skipping
and projection pushdown, and adds full test coverage ported from the Java implementation.

Implementation Changes:

  • Consolidate ManifestReader logic: Move implementation from manifest_reader_internal.cc
    to manifest_reader.cc and remove the obsolete internal class.
  • Fluent API: Add Select(), FilterPartitions(), and FilterRows() to support
    column projection and expression-based filtering (partition & metrics).
  • Optimization: Support lazy initialization of the underlying Avro reader.
  • ManifestEntry: Update schema handling to support reading specific columns and stats.

Test Coverage:

  • Add manifest_reader_test.cc: Ports TestManifestReader from Java, covering:
    • Basic reading and partition filtering.
    • V2/V3 specific features: Delete Files and Deletion Vectors (DVs).
    • Invalid usage validation (missing snapshot ID).
  • Add manifest_reader_stats_test.cc: Ports TestManifestReaderStats, verifying:
    • Metrics projection logic (stats columns included/dropped based on filter/select).
    • Exact match of stats parsing against expected values.
  • Parameterized Testing: All tests run against Format Versions 1, 2, and 3 to ensure cross-version compatibility.

@wgtmac wgtmac force-pushed the enhance_manifest_reader branch 3 times, most recently from 537a136 to d4f7c24 Compare December 23, 2025 04:27
@wgtmac wgtmac marked this pull request as ready for review December 23, 2025 04:30
@wgtmac wgtmac force-pushed the enhance_manifest_reader branch from d4f7c24 to 81b1732 Compare December 23, 2025 04:35
- Consolidate `ManifestReader` implementation into `manifest_reader.cc`
  and remove `manifest_reader_internal.cc`.
- Implement fluent API for column selection, partition filtering, and
  row filtering.
- Support lazy initialization of the underlying Avro reader.
- Add various filtering support for entries.
@wgtmac wgtmac force-pushed the enhance_manifest_reader branch from 81b1732 to 5b11c46 Compare December 23, 2025 04:39
@wgtmac wgtmac changed the title feat: add projection and filtering to manifest reader feat: enhance ManifestReader with projection, filtering and comprehensive tests Dec 23, 2025
@wgtmac wgtmac changed the title feat: enhance ManifestReader with projection, filtering and comprehensive tests feat: enhance ManifestReader with projection and filtering support Dec 23, 2025
@wgtmac
Copy link
Member Author

wgtmac commented Dec 24, 2025

Thanks for the review! I have addressed all comments. Let me know what you think. @WZhuo @zhjwpku @HuaHuaY @shangxinli

@wgtmac
Copy link
Member Author

wgtmac commented Dec 25, 2025

Thanks all for the review! Let me merge it.

@wgtmac wgtmac merged commit f43d24b into apache:main Dec 25, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants